Top-k String Auto-Completion with Synonyms

Authors

  • Pengfei Xu
  • Jiaheng Lu
Abstract

Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the character sequence typed so far (i.e., the prefix). However, in many real applications, an entity often has synonyms or abbreviations. For example, “DBMS” is an abbreviation of “Database Management Systems”. In this paper, we study a novel type of auto-completion that uses synonyms and abbreviations. We propose three trie-based algorithms to solve top-k auto-completion with synonyms, each with a different space and time complexity trade-off. Experiments on large-scale datasets show that it is possible to support effective and efficient synonym-based retrieval of completions of a million strings with thousands of synonym rules at about a microsecond per completion, with a small space overhead (i.e., 160-200 bytes per string). The source code of our experiments can be downloaded at http://udbms.cs.helsinki.fi/?projects/autocompletion/download.
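To make the problem statement concrete, the following is a minimal sketch of top-k completion with synonym rules. It is not one of the three algorithms from the paper: it simply inserts every rewritten variant of a dictionary string into a plain trie and scans the whole subtree under the prefix. The names `SynonymTrie`, `build`, and `top_k`, the scores, and the convention that a rule `(lhs, rhs)` means "`lhs` in a dictionary string may be typed as `rhs`" are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


class SynonymTrie:
    """Toy trie that maps every typed variant back to its dictionary string."""

    def __init__(self):
        self.children = defaultdict(SynonymTrie)
        self.entries = set()  # (score, original) pairs whose variant ends here

    def insert(self, variant, original, score):
        node = self
        for ch in variant:
            node = node.children[ch]
        node.entries.add((score, original))

    def _collect(self, out):
        # Gather every entry stored in this subtree (no pruning in this sketch).
        out.update(self.entries)
        for child in self.children.values():
            child._collect(out)

    def top_k(self, prefix, k):
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        found = set()
        node._collect(found)
        # A string reachable through several variants is reported once.
        best = {}
        for score, original in found:
            best[original] = max(score, best.get(original, score))
        # Highest score first; ties broken alphabetically.
        return heapq.nsmallest(k, best.items(), key=lambda kv: (-kv[1], kv[0]))


def build(scored_strings, rules):
    """Index each string plus one variant per applicable rule (assumed semantics)."""
    trie = SynonymTrie()
    for s, score in scored_strings.items():
        variants = {s}
        for lhs, rhs in rules:
            if lhs in s:
                variants.add(s.replace(lhs, rhs))
        for v in variants:
            trie.insert(v, s, score)
    return trie


if __name__ == "__main__":
    trie = build(
        {"Database Management Systems": 10, "Database Design": 7, "Data Mining": 5},
        [("Database Management Systems", "DBMS")],
    )
    print(trie.top_k("DB", 2))
    print(trie.top_k("Data", 2))
```

Typing "DB" reaches "Database Management Systems" only through the synonym variant "DBMS", which is the behaviour the abstract motivates. Materialising every variant is what drives up space in this sketch; the abstract's emphasis on space and time trade-offs suggests the paper's algorithms address exactly that cost.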

Similar resources

Space-efficient data structures for Top-k completion

Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings beginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so larg...

Widen the Peepholes! Entity-based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search

Today’s search engines provide instant keyword-based auto-suggestion and completion of the user’s search queries. This paper presents a novel auto-suggestion interface for the Semantic Multimedia Explorer (SEMEX), a semantic search engine that supports entity-based exploratory video retrieval. In contrast to traditional text-based retrieval, auto-suggestion and auto-completion of the user’s qu...

Piecewise Synonyms for Enhanced UMLS Source Terminology Integration

The UMLS contains more than 100 source vocabularies and is growing via the integration of others. When integrating a new source, the source terms already in the UMLS must first be found. The easiest approach to this is simple string matching. However, string matching usually does not find all concepts that should be found. A new methodology, based on the notion of piecewise synonyms, for enhanc...

Predicting Source Code Effectiveness of Prediction based Source Code Auto Completion

Auto Completion is the facility provided by most modern Integrated Development Environments and source code editors for word completion when editing source code. All auto completion mechanisms that we know of use syntactic knowledge of the programming language to provide this feature. We investigate the use of programming language agnostic prediction models to provide auto completion. We implem...

An Algorithm for Hypergraph Completion According to Hyperedge Replacement Grammars

The algorithm of Cocke, Younger, and Kasami is a dynamic programming technique well known from string parsing. It has been adapted to hypergraphs successfully by Lautemann. As a result, many practically relevant hypergraph languages generated by hyperedge replacement can be parsed in an acceptable time. In this paper we extend this algorithm by hypergraph completion: If necessary, appropriate fre...

Publication date: 2017